Guaranteed Bounds for General Approximate Dynamic Programming

Abstract

In this paper, we develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. Our approach is inspired by our recent results on bounding the performance of greedy strategies in the optimization of string-submodular functions over a finite horizon. The approach is to derive a string-submodular optimization problem for which the optimal strategy is the optimal control solution and the greedy strategy is the ADP solution. Using this approach, we show that any ADP solution achieves a performance that is at least a factor of $\beta$ of the performance of the optimal control solution, which satisfies Bellman's optimality principle. The factor $\beta$ depends on the specific ADP scheme, as we will explicitly characterize. To illustrate the applicability of our bounding technique, we present examples of ADP schemes, including the popular rollout method.
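
To make the comparison in the abstract concrete, here is a minimal sketch of a rollout scheme on a toy deterministic finite-horizon problem. It is not taken from the paper: the dynamics, the reward table `REWARD`, the myopic `base_policy`, and every function name are hypothetical choices made only for illustration. The printed ratio is just the empirical rollout-to-optimal ratio for this toy instance, not the paper's factor $\beta$.

```python
from itertools import product

HORIZON = 4
ACTIONS = (0, 1, 2)
NUM_STATES = 5
# Hypothetical stage rewards REWARD[state][action]; all values are made up.
REWARD = (
    (1, 0, 3),
    (2, 1, 0),
    (0, 4, 1),
    (1, 1, 2),
    (3, 0, 1),
)


def step(state, action):
    """Toy deterministic dynamics: return (next_state, stage_reward)."""
    return (state + action) % NUM_STATES, REWARD[state][action]


def total_reward(state, actions):
    """Accumulated reward of applying a whole action string from `state`."""
    total = 0
    for action in actions:
        state, reward = step(state, action)
        total += reward
    return total


def base_policy(state):
    """Hypothetical base heuristic: pick the action with the best stage reward."""
    return max(ACTIONS, key=lambda a: REWARD[state][a])


def base_value(state, steps_left):
    """Reward collected by following the base policy for `steps_left` steps."""
    total = 0
    for _ in range(steps_left):
        state, reward = step(state, base_policy(state))
        total += reward
    return total


def rollout_actions(state):
    """Rollout: at each stage, score every action by its stage reward plus
    the base policy's reward-to-go from the resulting state, then act greedily."""
    chosen = []
    for k in range(HORIZON):
        def score(action):
            next_state, reward = step(state, action)
            return reward + base_value(next_state, HORIZON - k - 1)

        best = max(ACTIONS, key=score)
        chosen.append(best)
        state, _ = step(state, best)
    return chosen


def optimal_value(state):
    """Brute-force optimum over all action strings of length HORIZON."""
    return max(
        total_reward(state, seq) for seq in product(ACTIONS, repeat=HORIZON)
    )


if __name__ == "__main__":
    for start in range(NUM_STATES):
        rollout = total_reward(start, rollout_actions(start))
        optimum = optimal_value(start)
        print(start, base_value(start, HORIZON), rollout, optimum, rollout / optimum)
```

In this sketch the optimum is found by enumerating all action strings, which is only feasible because the instance is tiny. Because the base policy here is a fixed stationary policy on a deterministic problem, the rollout string should never do worse than the base policy, and by definition it can never beat the brute-force optimum.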
